72 research outputs found

    Towards a Unifying Framework for Tuning Analysis Precision by Program Transformation

    Static and dynamic program analyses attempt to extract useful information about programs' behaviours. Static analysis uses an abstract model of programs to reason about their runtime behaviour without actually running them, while dynamic analysis reasons on a test set of real program executions. For this reason, the precision of static analysis is limited by the presence of false positives (executions allowed by the abstract model that cannot happen at runtime), while the precision of dynamic analysis is limited by the presence of false negatives (real executions that are not in the test set). Researchers have developed many analysis techniques and tools in an attempt to increase the precision of program verification. Software protection is an interesting scenario where programs need to be protected from adversaries that use program analysis to understand their inner workings and then exploit this knowledge to perform illicit actions. Program analysis plays a dual role in program verification and software protection: in program verification we want the analysis to be as precise as possible, while in software protection we want to degrade the results of the analysis as much as possible. Indeed, in software protection researchers usually resort to a special class of program transformations, called code obfuscation, to modify a program so as to make it more difficult to analyse while preserving its intended functionality. In this setting, it is interesting to study how program transformations that preserve the intended behaviour of programs can affect the precision of both static and dynamic analysis. While some work has been done to formalise the effectiveness of code obfuscation in degrading static analysis, and to study the possibility of transforming programs in order to avoid or increase false positives, less attention has been paid to formalising the relation between program transformations and false negatives in dynamic analysis. In this work we set the scene for a formal investigation of the syntactic and semantic program features that affect the presence of false negatives in dynamic analysis. We believe that this understanding would be useful for improving the precision of existing dynamic analysis tools and for designing program transformations that complicate dynamic analysis.
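
    As a concrete illustration of the two error sources the abstract contrasts, the following toy sketch (entirely our own, not from the paper) shows a program on which a coarse static model raises a false positive and a small test set yields a false negative:

        # Toy illustration of static false positives vs. dynamic false negatives.
        # The program and the commentary are illustrative, not from the paper.

        def program(x: int) -> int:
            y = 2 * x          # y is always even
            if y % 2 == 1:     # never true at runtime; a sound static analysis that
                               # forgets parity still considers this branch feasible
                               # and reports a possible error: a false positive
                raise RuntimeError("unreachable")
            if x > 1000:       # genuinely reachable error path
                raise ValueError("input too large")
            return y

        # Dynamic analysis only sees the executions in its test set: with these
        # small inputs the reachable ValueError path is never exercised, so the
        # real failing behaviour is missed: a false negative.
        test_set = [0, 1, 2, 3]
        print([program(x) for x in test_set])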

    Semantics-based software watermarking by abstract interpretation

    Software watermarking is a software protection technique used to defend the intellectual property of proprietary code. In particular, software watermarking aims at preventing software piracy by embedding a signature, i.e. an identifier reliably representing the owner, in the code. When an illegal copy is made, the owner can claim his/her identity by extracting the signature. It is important to hide the signature in the program in order to make it difficult for the attacker to detect, tamper with or remove it. In this work we present a formal framework for software watermarking, based on program semantics and abstract interpretation, where attackers are modeled as abstract interpreters. In this setting we can prove that the ability to identify signatures can be modeled as a completeness property of the attackers in the abstract interpretation framework. Indeed, hiding a signature in the code corresponds to embedding it as a semantic property that can be retrieved only by attackers that are complete for it. Any abstract interpreter that is not complete for the property specifying the signature cannot detect, tamper with or remove it. We formalize in the proposed framework the major quality features of a software watermarking technique: secrecy, resilience, transparence and accuracy. This provides a unifying framework for interpreting both watermarking schemes and attacks, and it allows us to formally compare the quality of different watermarking techniques. Indeed, a large number of watermarking techniques exist in the literature and they are typically evaluated with respect to their secrecy, resilience, transparence and accuracy against attacks. Formally identifying the attacks for which a watermarking scheme is secret, resilient, transparent or accurate can be a complex and error-prone task, since attacks and watermarking schemes are typically defined in different settings and using different languages (e.g. program transformation vs. program analysis), complicating the task of comparing one against the others.
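
    The following minimal sketch (our own construction, not from the paper) gives the flavour of a signature embedded as a semantic property: here the signature is the residue modulo 7 of every output, which a coarse observer misses and a residue-aware observer recovers:

        # Illustrative only: the signature is embedded as the property
        # "every output is congruent to 3 modulo 7". Attackers are modelled as
        # abstractions of the output; only an abstraction precise enough for
        # residues modulo 7 can extract the signature.

        SIGNATURE = 3  # the owner's identifier, chosen here for illustration

        def watermarked(x: int) -> int:
            # Functionally a quadratic computation, arranged so that
            # watermarked(x) % 7 == SIGNATURE on every run.
            return 7 * (x * x) + SIGNATURE

        def sign_attacker(n: int) -> str:
            # Coarse attacker: observes only the sign of the output.
            return "+" if n > 0 else ("-" if n < 0 else "0")

        def mod7_attacker(n: int) -> int:
            # Attacker complete for the signature property: observes residues mod 7.
            return n % 7

        outputs = [watermarked(x) for x in range(1, 20)]
        print({sign_attacker(o) for o in outputs})   # {'+'} : signature invisible
        print({mod7_attacker(o) for o in outputs})   # {3}   : signature extracted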

    Software Watermarking: A Semantics-based Approach

    Software watermarking is a defence technique used to prevent software piracy by embedding a signature, i.e., an identifier reliably representing the owner, in the code. When an illegal copy is made, the ownership can be claimed by extracting this identifier. The signature has to be hidden inside the program and it has to be difficult for an attacker to detect, tamper with or remove it. In this paper we show how the ability of the attacker to identify the signature can be modelled in the framework of abstract interpretation as a completeness property. We view attackers as abstract interpreters that can precisely observe only the properties for which they are complete. In this setting, hiding a signature in the code corresponds to inserting it in terms of a semantic property that can be retrieved only by attackers that are complete for it. Indeed, any abstract interpreter that is not complete for the property specifying the signature cannot detect, tamper with or remove it. The goal of this work is to introduce a formal framework for the modelling, at a semantic level, of software watermarking techniques and their quality features.
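
    For reference, the completeness condition that both watermarking abstracts appeal to is the standard backward-completeness equation of abstract interpretation (standard notation, not quoted from the paper): an attacker modelled by an abstraction can recover the signature property only if it loses nothing with respect to the concrete computation that carries it.

        % Backward completeness of an attacker abstraction \alpha for the
        % concrete operation f carrying the signature property: the abstract
        % counterpart f^\sharp observes exactly what \alpha can still see.
        \alpha \circ f = f^{\sharp} \circ \alpha
        \qquad\text{i.e.}\qquad
        \alpha(f(c)) = f^{\sharp}(\alpha(c)) \ \text{ for every concrete } c .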

    Testing Android malware detectors against code obfuscation: a systematization of knowledge and unified methodology

    The authors of mobile malware have started to leverage program protection techniques to circumvent anti-viruses, or simply to hinder reverse engineering. In response to the diffusion of anti-virus applications, several studies have proposed a plethora of analyses and approaches that highlight their limitations when malware authors employ program-protection techniques. An important contribution of this work is a systematization of the state of the art of anti-virus apps, comparing the existing approaches and providing a detailed analysis of their pros and cons. As a result of our systematization, we notice a lack of openness and reproducibility that, in our opinion, are crucial for any analysis methodology. Following this observation, the second contribution of this work is an open, reproducible, rigorous methodology to assess the effectiveness of mobile anti-virus tools against code-transformation attacks. Our unified workflow, released in the form of an open-source prototype, comprises a comprehensive set of obfuscation operators. It is intended to be used by anti-virus developers and vendors to test the resilience of their products against a large dataset of malware samples and obfuscations, and to obtain insights on how to improve their products with respect to particular classes of code-transformation attacks.
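
    To give a flavour of the obfuscation operators such a workflow exercises, here is a toy string-encoding transformation (purely illustrative; it is not an operator taken from the released prototype): literal strings are shipped XOR-encoded and decoded at runtime, which defeats naive plaintext signature matching.

        # Toy string-encoding obfuscation operator (illustrative only).
        KEY = 0x5A

        def encode(s: str) -> bytes:
            # Applied at obfuscation time to every string literal.
            return bytes(b ^ KEY for b in s.encode("utf-8"))

        def decode(blob: bytes) -> str:
            # Small runtime stub inserted into the obfuscated app.
            return bytes(b ^ KEY for b in blob).decode("utf-8")

        # Original code contained:  url = "http://c2.example/payload"
        # The obfuscated app only ships the encoded bytes plus the decoder.
        ENCODED_URL = encode("http://c2.example/payload")

        def get_url() -> str:
            return decode(ENCODED_URL)

        print(get_url())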

    Infections as Abstract Symbolic Finite Automata: Formal Model and Applications.

    In this paper we propose a methodology, based on machine learning, for building a model of infected systems as symbolic finite automata. The model expresses the interaction between the malware and the environment by combining in the same model the code and the semantics of a system, and it allows tuning the observation of both the system and the malware code. Moreover, we show that this methodology may have several applications in the context of malware detection.
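
    A minimal way to represent the kind of symbolic finite automaton the abstract refers to, with transitions labelled by predicates over observed events rather than by single symbols, is sketched below (the structure, event format and predicates are our own illustration):

        # Minimal symbolic finite automaton (SFA) sketch: one guarded edge can
        # summarise many concrete malware/environment interactions.
        from typing import Callable, List, Tuple

        Predicate = Callable[[str], bool]

        class SFA:
            def __init__(self, initial: str, accepting: set):
                self.initial = initial
                self.accepting = accepting
                self.transitions: List[Tuple[str, Predicate, str]] = []

            def add(self, src: str, guard: Predicate, dst: str) -> None:
                self.transitions.append((src, guard, dst))

            def accepts(self, trace: List[str]) -> bool:
                state = self.initial
                for event in trace:
                    for src, guard, dst in self.transitions:
                        if src == state and guard(event):
                            state = dst
                            break
                    else:
                        return False  # no transition matches the observed event
                return state in self.accepting

        # Example infection pattern: read sensitive data, then send it out.
        infection = SFA(initial="clean", accepting={"exfiltrated"})
        infection.add("clean", lambda e: e.startswith("read:/data/"), "collected")
        infection.add("collected", lambda e: e.startswith("send:"), "exfiltrated")

        print(infection.accepts(["read:/data/contacts", "send:http://c2.example"]))  # True
        print(infection.accepts(["read:/tmp/cache", "send:http://c2.example"]))      # False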

    Code obfuscation and malware detection by abstract interpretation

    An obfuscating transformation aims at confusing a program in order to make it more difficult to understand while preserving its functionality. Software protection and malware detection are two major applications of code obfuscation. Software developers use code obfuscation in order to defend their programs against attacks on their intellectual property, usually called malicious host attacks. In fact, by making programs more difficult to understand it is possible to obstruct malicious reverse engineering, a typical attack on the intellectual property of programs. On the other hand, malware writers usually obfuscate their malicious code in order to avoid detection. In this setting, the ability of code obfuscation to foil most of the existing detection techniques, such as misuse detection algorithms, lies in the purely syntactic nature of those techniques, which makes malware detection sensitive to slight modifications of program syntax. In the software protection scenario, researchers try to develop sophisticated obfuscating techniques that are able to resist as many attacks as possible. In the malware detection scenario, on the other hand, it is important to design advanced detection algorithms in order to detect as many variants of obfuscated malware as possible. It is clear how both malicious host and malicious code attacks represent harmful threats to the security of computer networks. In this dissertation we are interested in both of the security issues described above. In particular, we describe a formal approach to code obfuscation and malware detection based on program semantics and abstract interpretation. This theoretical framework is useful for countering some well-known drawbacks of software protection through code obfuscation, as well as for improving existing malware detection schemes. In fact, the lack of rigorous theoretical bases for code obfuscation prevents any possibility of formally studying and certifying its effectiveness in protecting proprietary programs. Moreover, in order to design malware detection schemes that are resilient to obfuscation, we have to focus on program semantics rather than on program syntax. Our formal framework for code obfuscation relies on a semantics-based definition of code obfuscation that characterizes each program transformation T as a potential obfuscation in terms of the most concrete property preserved by T on program semantics. Deobfuscating techniques, and reverse engineering in general, usually begin with some sort of static program analysis, which can be specified as an abstraction of program semantics. In the software protection scenario, this observation naturally leads to modeling attackers as abstractions of program semantics. In fact, the abstraction modeling the attacker expresses the amount of information, namely the semantic properties, that the attacker is able to observe. It follows that, by comparing the degree of abstraction of an attacker A with that of the most concrete property preserved by an obfuscating transformation T, it is possible to understand whether obfuscation T defeats attack A. Following the same reasoning, it is possible to compare the efficiency of different obfuscating transformations, as well as the ability of different attackers to defeat a given obfuscation. We apply our semantics-based framework to a known control-flow obfuscation technique that aims at confusing the control flow of the original program by inserting opaque predicates.
    As argued above, an obfuscating transformation modifies a program while preserving an abstraction of its semantics. This means that different obfuscated versions of the same malware have to share (at least) the malicious intent, namely the maliciousness of their semantics, even if they may express it through different syntactic forms. The basic idea of our formal approach to malware detection is to use program semantics to model both malware and program behaviour, and semantic abstractions to hide the details changed by the obfuscation. Thus, given an obfuscation T, we are interested in defining an abstraction of program semantics that does not distinguish between the semantics of a malware M and the semantics of its obfuscated version T(M). In particular, we provide this suitable abstraction for an interesting class of commonly used obfuscating transformations. It is clear that, given a malware detector D, it is always possible to define its semantic counterpart by analyzing how D works on program semantics. At this point, by translating both malware detectors and obfuscating transformations into the semantic world, we are able to certify which obfuscations a detector is able to handle. This means that our semantics-based framework provides a formal setting where malware detector designers can prove the efficiency of their algorithms.
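
    The control-flow obfuscation mentioned at the end of the first paragraph relies on opaque predicates; a textbook example of an always-true opaque predicate (our own toy instance, not one taken from the dissertation) looks as follows:

        # x*(x+1) is even for every integer x, so the guard below always holds,
        # yet deciding that requires reasoning about arithmetic semantics rather
        # than syntax alone.
        def obfuscated(y: int, x: int) -> int:
            if (x * (x + 1)) % 2 == 0:   # opaque predicate: always true
                return y + 1             # the only branch ever executed
            return y * 42                # dead code inserted to clutter the CFG

        # Functionality is preserved: obfuscated(y, x) == y + 1 for all inputs.
        assert all(obfuscated(y, x) == y + 1 for y in range(-3, 4) for x in range(-3, 4))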

    Hunting Distributed Malware with the k-Calculus

    The defense of computer systems from malicious software attacks, such as viruses and worms, is a key aspect of computer security. The analogy between malicious software and biological infections suggested the use of the k-calculus, a formalism originally developed for the analysis of biological systems, for the formalization and analysis of malicious software. By modeling the different actors involved in a malicious code attack in the k-calculus and by simulating their behavior, it is possible to extract important information that can drive the choice of the defense technique to apply.

    Revealing Similarities in Android Malware by Dissecting their Methods

    One of the most challenging problems in the fight against Android malware is finding a way to classify samples according to their behavior, in order to be able to reuse previously gathered knowledge in analysis and prevention. In this paper we introduce a novel technique that discovers similarities between Android malware samples by comparing fragments of executed traces (strands) generated from their most suspect methods. This way we can accurately pinpoint which (possibly) malicious behaviors are shared between different samples, allowing for easier analysis and classification. We implement this approach in a tool, StrAndroid, that we evaluate on a few datasets of malware and ransomware samples, comparing its results to an existing similarity tool.
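
    As a rough stand-in for the strand comparison described above (this is not StrAndroid's actual algorithm; the fragment size, the Jaccard score and the example traces are our own), one could split each method trace into short overlapping fragments and measure their overlap:

        # Naive fragment-overlap similarity between two method traces.
        from typing import List, Set, Tuple

        def fragments(trace: List[str], size: int = 3) -> Set[Tuple[str, ...]]:
            # Overlapping windows of `size` consecutive trace entries.
            return {tuple(trace[i:i + size]) for i in range(len(trace) - size + 1)}

        def similarity(a: List[str], b: List[str]) -> float:
            fa, fb = fragments(a), fragments(b)
            if not fa or not fb:
                return 0.0
            return len(fa & fb) / len(fa | fb)   # Jaccard index over shared fragments

        sample_1 = ["getDeviceId", "openConnection", "write", "encrypt", "delete"]
        sample_2 = ["getDeviceId", "openConnection", "write", "sendSMS"]
        print(round(similarity(sample_1, sample_2), 2))   # 0.25 on this toy input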

    Analyzing program dependences for malware detection.

    Metamorphic malware continuously modify their code, while preserving their functionality, in order to foil misuse detection. The key to defeating metamorphism lies in a semantic characterization of the embedding of the malware into the target program. Indeed, a behavioral model of program infection that does not rely on syntactic program features should be able to defeat metamorphism. Moreover, a general model of infection should be able to express dependences and interactions between the malicious code and the target program. ANI is a general theory for the analysis of data dependences in a program. We propose a higher-order theory for ANI, called HOANI, that allows studying program dependences. Our idea is then to formalize and study the malware detection problem in terms of HOANI.
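
    Reading ANI as abstract non-interference (the usual expansion in this line of work), one standard formulation of the property, on which the higher-order development presumably builds, is the following; the notation is the customary one and may differ from the paper's:

        % Narrow abstract non-interference (\eta)P(\rho): varying the private
        % input H cannot be observed through the abstraction \rho of the public
        % output, as long as the public inputs agree up to the abstraction \eta.
        \forall h_1, h_2 \in \mathbb{H},\ \forall l_1, l_2 \in \mathbb{L}.\;
        \eta(l_1) = \eta(l_2) \;\Longrightarrow\;
        \rho\big(\llbracket P \rrbracket(h_1, l_1)^{\mathrm{L}}\big)
          = \rho\big(\llbracket P \rrbracket(h_2, l_2)^{\mathrm{L}}\big)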

    Partial (In)Completeness in Abstract Interpretation: Limiting the Imprecision in Program Analysis

    Imprecision is inherent in any decidable (sound) approximation of undecidable program properties. In abstract interpretation this corresponds to the release of false alarms, e.g., when it is used for program analysis and program verification. As with all alarming systems, a program analysis tool is credible when it reports few false alarms. As a consequence, we have to live with false alarms, but we also need methods to control them. As with all approximation methods, in abstract interpretation we need to estimate the imprecision accumulated during program analysis. In this paper we introduce a theory for estimating the error propagation in abstract interpretation, and hence in program analysis. We enrich abstract domains with a weakening of a metric distance. This enriched structure keeps coherence between the standard partial order, which relates approximated objects by their relative precision, and the effective error made in the approximation. An abstract interpretation is precise when it is complete. We introduce the notion of partial completeness as a weakening of precision. In partial completeness the abstract interpreter may produce a bounded number of false alarms. We prove the key recursive properties of the class of programs for which an abstract interpreter is partially complete with a given bound of imprecision. Then, we introduce a proof system for estimating an upper bound on the error accumulated by the abstract interpreter during program analysis. Our framework is general enough to be instantiated to most known metrics for abstract domains.
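
    The contrast between completeness and its partial weakening can be stated roughly as follows; the distance symbol and the bound are our own notation for the weakened metric structure the abstract describes, not the paper's exact definitions:

        % Completeness: the abstract operation loses nothing observable by \alpha.
        \alpha \circ f = f^{\sharp} \circ \alpha
        % Partial completeness: the abstract result may deviate from the best
        % one, but only within a bound \varepsilon measured by a weakened
        % distance \delta on the abstract domain (\delta, \varepsilon are ours).
        \delta\big(\alpha(f(c)),\, f^{\sharp}(\alpha(c))\big) \le \varepsilon
        \quad \text{for every concrete } c .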